[Computer Architecture] 6. Storage and Other I/O Topics
2020-10-11
This post is based on the Computer Architecture lectures of Professor 최규상 (Gyu-Sang Choi) at Yeungnam University.
6.1 Introduction
-
Introduction
-
I/O System Characteristics
-
Dependability is important
- Particularly for storage devices
-
Performance measures
- Latency (response time)
- Throughput (bandwidth)
-
Desktops & embedded systems
- Mainly interested in response time & diversity of devices
-
Servers
- Mainly interested in throughput & expandability of devices
-
6.2 Dependability, Reliability, and Availability
-
-
Fault: failure of a component
- May or may not lead to system failure
-
-
Dependability Measures
- Reliability: mean time to failure (MTTF)
- Service interruption: mean time to repair (MTTR)
-
Mean time between failures
- MTBF = MTTF + MTTR
- Availability = MTTF / (MTTF + MTTR)
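As a quick sanity check, the two formulas above in a minimal Python sketch (the MTTF/MTTR values are made-up examples, not from the lecture):

```python
# Availability from MTTF and MTTR; numbers are hypothetical examples.
mttf = 1_000_000  # mean time to failure, hours
mttr = 24         # mean time to repair, hours

mtbf = mttf + mttr                    # MTBF = MTTF + MTTR
availability = mttf / (mttf + mttr)   # Availability = MTTF / (MTTF + MTTR)

print(f"MTBF = {mtbf} hours")
print(f"Availability = {availability:.6f} ({availability * 100:.4f}%)")
```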
-
Improving Availability
- Increase MTTF: fault avoidance, fault tolerance, fault forecasting
- Reduce MTTR: improved tools and processes for diagnosis and repair
6.3 Disk Storage
-
Disk Storage
-
Disk Sectors and Access
-
Each sector records
- Sector ID
- Data (512 bytes, 4096 bytes proposed)
- Error correcting code (ECC), used to hide defects and recording errors
- Synchronization fields and gaps
-
Access to a sector involves
- Queuing delay if other accesses are pending
- Seek: move the heads
- Rotational latency
- Data transfer
- Controller overhead
-
-
Disk Access Example
-
Given
- 512B sector, 15,000rpm, 4ms average seek time, 100MB/s transfer rate, 0.2ms controller overhead, idle disk
-
Average read time
- Seek time: 4 ms
- Rotational latency: (1/2 rotation) / (15,000 / 60 rotations/sec) = 2 ms
- Transfer time: 512 B / (100 MB/s) = 0.005 ms
- Controller delay: 0.2 ms
- Total: 4 + 2 + 0.005 + 0.2 ≈ 6.2 ms
-
If actual average seek time is 1ms
- Average read time = 3.2ms
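The same arithmetic as a short Python sketch, so each term is easy to verify (1 MB is taken as 10^6 B, matching the rounding above):

```python
# Worked disk-access example: 512B sector, 15,000 rpm, 4 ms seek,
# 100 MB/s transfer, 0.2 ms controller overhead, idle disk.
seek_ms = 4.0
rotation_ms = 0.5 / (15_000 / 60) * 1000   # half a rotation = 2 ms
transfer_ms = 512 / (100 * 10**6) * 1000   # ~0.005 ms
controller_ms = 0.2

total = seek_ms + rotation_ms + transfer_ms + controller_ms
print(f"average read time = {total:.3f} ms")                                # ~6.2 ms
print(f"with 1 ms seek    = {total - seek_ms + 1.0:.3f} ms")                # ~3.2 ms
```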
-
-
Disk Performance Issues
-
Manufacturers quote average seek time
- Based on all possible seeks
- Locality and OS scheduling lead to smaller actual average seek times
-
A smart disk controller allocates physical sectors on disk
- Present logical sector interface to host
- SCSI, ATA, SATA, SAS
-
Disk drives include caches
- Prefetch sectors in anticipation of access
- Avoid seek and rotational delay
-
6.4 Flash Storage
-
Flash Storage
-
Flash Types
-
NOR flash: bit cell like a NOR gate
- Random read/write access
- Used for instruction memory in embedded systems
-
NAND flash: bit cell like a NAND gate
- Denser (bits/area), but block-at-a-time access
- Cheaper per GB
- Used for USB keys, media storage, ...
-
Flash bits wear out after 10,000s of accesses
- Not suitable for direct RAM or disk replacement
- Wear leveling: remap data to less used blocks
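A toy illustration of wear leveling: every rewrite of a logical block is remapped to the least-erased physical block. Real flash translation layers are far more elaborate; this only shows the remapping idea:

```python
# Toy wear leveling: remap each write to the least-worn physical block.
erase_count = {pb: 0 for pb in range(8)}  # physical block -> erase count
mapping = {}                              # logical block -> physical block

def write(logical_block):
    old = mapping.get(logical_block)
    in_use = set(mapping.values()) - {old}      # blocks holding other data
    victim = min((pb for pb in erase_count if pb not in in_use),
                 key=lambda pb: erase_count[pb])
    erase_count[victim] += 1                    # charge one program/erase
    mapping[logical_block] = victim

for _ in range(100):
    write(0)            # hammer a single logical block
print(erase_count)      # wear is spread evenly over the 8 physical blocks
```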
-
6.5 Connecting Processors, Memory, and I/O Devices
-
Interconnecting Components
-
Need interconnections between
- CPU, memory, I/O controllers
-
Bus: shared communication channel
- Parallel set of wires for data and synchronization of data transfer
- Can become a bottleneck
-
Performance limited by physical factors
- Wire length, number of connections
-
More recent alternative: high-speed serial connections with switches
- Like networks
-
-
Bus Types
-
Processor-Memory buses
- Short, high speed
- Design is matched to memory organization
-
I/O buses
- Longer, allowing multiple connections
- Specified by standards for interoperability
- Connect to processor-memory bus through a bridge
-
-
Bus Signals and Synchronization
-
Data lines
- Carry address and data
- Multiplexed or separate
-
Control lines
- Indicate data type, synchronize transactions
-
Synchronous
- Uses a bus clock
-
Asynchronous
- Uses request/acknowledge control lines for handshaking
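A rough model of the request/acknowledge handshake using two threads and events (simplified to two phases; real asynchronous buses typically use a four-phase handshake):

```python
import threading

req, ack = threading.Event(), threading.Event()
bus = []

def sender():
    bus.append(42)   # drive data onto the "bus"
    req.set()        # assert request
    ack.wait()       # wait for acknowledge
    req.clear()      # drop request, ending the transaction

def receiver():
    req.wait()                       # wait for request
    print("received:", bus.pop())    # latch the data
    ack.set()                        # assert acknowledge

t = threading.Thread(target=receiver)
t.start()
sender()
t.join()
```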
-
- I/O Bus Examples
- Typical x86 PC I/O System
6.6 Interfacing I/O Devices to the Processor, Memory, and Operating System
-
I/O Management
-
I/O is mediated by the OS
-
Multiple programs share I/O resources
- Need protection and scheduling
-
I/O causes asynchronous interrupts
- Same mechanism as exceptions
-
I/O programming is fiddly
- OS provides abstractions to programs
-
-
-
I/O Commands
-
I/O devices are managed by I/O controller hardware
- Transfers data to/from device
- Synchronizes operations with software
-
Command registers
- Cause device to do something
-
Status registers
- Indicate what the device is doing and occurrence of errors
-
Data registers
- Write: transfer data to a device
- Read: transfer data from a device
-
-
I/O Register Mapping
-
Memory mapped I/O
- Registers are addressed in same space as memory
- Address decoder distinguishes between them
- The OS uses the address translation mechanism to make them accessible only to the kernel
-
I/O instructions
- Separate instructions to access I/O registers
- Can only be executed in kernel mode
- Example: x86
-
-
Polling
-
Periodically check I/O status register
- If device ready, do operation
- If error, take action
-
Common in small or low-performance real-time embedded systems
- Predictable timing
- Low hardware cost
- In other systems, wastes CPU time
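A polling loop sketch; `read_status`, `read_data`, and `handle_error` are hypothetical stand-ins for device-register accesses, not a real API:

```python
READY, ERROR = 0x1, 0x2   # assumed status-register bits

def poll_device(read_status, read_data, handle_error):
    while True:
        status = read_status()      # periodically check status register
        if status & ERROR:
            handle_error(status)    # if error, take action
        elif status & READY:
            return read_data()      # if device ready, do the operation

# Demo: a fake device that becomes ready on the third poll.
polls = iter([0x0, 0x0, 0x1])
print(poll_device(lambda: next(polls), lambda: "payload", print))
```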
-
-
Interrupts
-
When a device is ready or error occurs
- Controller interrupts CPU
-
Interrupt is like an exception
- But not synchronized to instruction execution
- Can invoke handler between instructions
- Cause information often identifies the interrupting device
-
Priority interrupts
- Devices needing more urgent attention get higher priority
- Can preempt the handler for a lower-priority interrupt
-
-
I/O Data Transfer
-
Polling and interrupt-driven I/O
- CPU transfers data between memory and I/O data registers
- Time consuming for high-speed devices
-
Direct memory access (DMA)
- OS provides starting address in memory
- I/O controller transfers to/from memory autonomously
- Controller interrupts on completion or error
-
-
DMA/Cache Interaction
-
If DMA writes to a memory block that is cached
- Cached copy becomes stale
-
If write-back cache has dirty block, and DMA reads memory block
- Reads stale data
-
Need to ensure cache coherence
- Flush blocks from cache if they will be used for DMA
- Or use non-cacheable memory locations for I/O
-
-
DMA/VM Interaction
-
OS uses virtual addresses for memory
- DMA blocks may not be contiguous in physical memory
-
Should DMA use virtual addresses?
- Would require controller to do translation
-
If DMA uses physical addresses
- May need to break transfers into page-sized chunks
- Or chain multiple transfers
- Or allocate contiguous physical pages for DMA
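A sketch of the page-sized chunking: each piece of the transfer stays within one page, so every chunk is physically contiguous (4 KB pages assumed):

```python
PAGE_SIZE = 4096

def dma_chunks(start_addr, length):
    """Yield (addr, size) pieces that never cross a page boundary."""
    addr, remaining = start_addr, length
    while remaining > 0:
        in_page = PAGE_SIZE - (addr % PAGE_SIZE)  # bytes left in this page
        size = min(in_page, remaining)
        yield addr, size
        addr += size
        remaining -= size

print(list(dma_chunks(0x1F00, 8192)))
# [(7936, 256), (8192, 4096), (12288, 3840)]
```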
-
6.7 I/O Performance Measures: Examples from Disk and File Systems
-
Measuring I/O Performance
-
I/O performance depends on
- Hardware: CPU, memory, controllers, buses
- Software: operating system, database management system, application
- Workload: request rates and patterns
-
I/O system design can trade-off between response time and throughput
- Measurements of throughput often done with constrained response-time
-
-
Transaction Processing Benchmarks
-
Transactions
- Small data accesses to a DBMS
- Interested in I/O rate, not data rate
-
Measure throughput
- Subject to response time limits and failure handling
- ACID (Atomicity, Consistency, Isolation, Durability)
- Overall cost per transaction
-
Transaction Processing Council (TPC) benchmarks (www.tpc.org)
- TPC-APP: B2B application server and web services
- TPC-C: on-line order entry environment
- TPC-E: on-line transaction processing for brokerage firm
- TPC-H: decision support (business-oriented ad-hoc queries)
-
-
File System & Web Benchmarks
-
SPEC System File System (SFS)
- Synthetic workload for NFS server, based on monitoring real systems
-
Results
- Throughput (operations/sec)
- Response time (average ms/operation)
-
SPEC Web Server benchmark
- Measures simultaneous user sessions, subject to required throughput/session
- Three workloads: Banking, Ecommerce, and Support
-
-
I/O vs. CPU Performance
6.8 Designing an I/O System
-
I/O System Design
-
Satisfying latency requirements
- For time-critical operations
-
If system is unloaded
- Add up latency of components
-
Maximizing throughput
- Find "weakest link" (lowest-bandwidth component)
- Configure to operate at its maximum bandwidth
- Balance remaining components in the system
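The "weakest link" idea in a few lines; the component bandwidths are made-up examples:

```python
# Sustainable throughput is bounded by the lowest-bandwidth component.
components = {            # MB/s, hypothetical values
    "memory bus": 1000,
    "I/O bus":     320,
    "disk":        100,
}
bottleneck = min(components, key=components.get)
print(f"weakest link: {bottleneck} at {components[bottleneck]} MB/s")
```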
-
If system is loaded, simple analysis is insufficient
- Need to use queuing models or simulation
-
6.9 Parallelism and I/O: Redundant Arrays of Inexpensive Disks
-
RAID
-
Redundant Array of Inexpensive (Independent) Disks
- Use multiple smaller disks (cf. one large disk)
- Parallelism improves performance
- Plus extra disk(s) for redundant data storage
-
Provides fault tolerant storage system
- Especially if failed disks can be "hot swapped"
-
RAID 0
-
No redundancy ("AID"?)
- Just stripe data over multiple disks
- But it does improve performance
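A minimal sketch of striping: logical block addresses map round-robin across the disks, so consecutive blocks can be accessed in parallel:

```python
def stripe(lba, num_disks):
    disk = lba % num_disks      # which disk holds the block
    offset = lba // num_disks   # block index on that disk
    return disk, offset

print([stripe(lba, 4) for lba in range(8)])
# [(0, 0), (1, 0), (2, 0), (3, 0), (0, 1), (1, 1), (2, 1), (3, 1)]
```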
-
-
-
RAID 1 & 2
-
RAID 1: Mirroring
-
N + N disks, replicate data
- Write data to both data disk and mirror disk
- On disk failure, read from mirror
-
-
RAID 2: Error correcting code (ECC)
- N + E disks (e.g., 10 + 4)
- Split data at bit level across N disks
- Generate E-bit ECC
- Too complex, not used in practice
-
-
RAID 3: Bit-Interleaved Parity
-
N + 1 disks
- Data striped across N disks at byte level
- Redundant disk stores parity
-
Read access
- Read all disks
-
Write access
- Generate new parity and access all disks
-
On failure
- Use parity to reconstruct missing data
- Not widely used
-
-
RAID 4: Block-Interleaved Parity
-
N + 1 disks
- Data striped across N disks at block level
- Redundant disk stores parity for a group of blocks
-
Read access
- Read only the disk holding the required block
-
Write access
- Just read disk containing modified block, and parity disk
- Calculate new parity, update data disk and parity disk (see the sketch below)
-
On failure
- Use parity to reconstruct missing data
- Not widely used
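The small-write parity update as a sketch: the new parity is old parity XOR old data XOR new data, so only the modified data disk and the parity disk are accessed:

```python
old_data   = bytes([0b1010_1010] * 4)   # block before the write
new_data   = bytes([0b1100_0011] * 4)   # block after the write
old_parity = bytes([0b0110_0110] * 4)

new_parity = bytes(p ^ od ^ nd
                   for p, od, nd in zip(old_parity, old_data, new_data))
print(new_parity.hex())
```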
-
- RAID 3 vs RAID 4
-
RAID 5: Distributed Parity
-
RAID 6: P + Q Redundancy
-
N + 2 disks
- Like RAID 5, but two lots of parity
- Greater fault tolerance through more redundancy
-
Multiple RAID
- More advanced systems give similar fault tolerance with better performance
-
-
RAID Summary
-
RAID can improve performance and availability
- High availability requires hot swapping
-
Assumes independent disk failures
- Too bad if the building burns down!
-
See "Hard Disk Performance, Quality and Reliability"
-
6.10 Real Stuff: Sun Fire x4150 Server
- pass
6.12 Fallacies and Pitfalls
-
Fallacy: Disk Dependability
-
If a disk manufacturer quotes MTTF as 1,200,000hr (140yr)
- A disk will work that long
-
Wrong: this is the mean time to failure
- What is the distribution of failures?
-
What if you have 1000 disks
- How many will fail per year?
Failed disks per year = 1000 disks * 8760 hrs/disk / 1,200,000 hrs/failure = 7.3, so Annual Failure Rate (AFR) = 7.3 / 1000 = 0.73%
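The same arithmetic in Python:

```python
mttf_hours = 1_200_000
disks = 1000
hours_per_year = 8760

failures_per_year = disks * hours_per_year / mttf_hours   # 7.3 disks
afr = failures_per_year / disks                           # per-disk rate
print(f"{failures_per_year:.1f} failed disks/year, AFR = {afr:.2%}")  # 0.73%
```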
-
-
Fallacies
-
Disk failure rates are as specified
-
Studies of failure rates in the field
- Schroeder and Gibson: 2% to 4% vs. 0.6% to 0.8%
- Pinheiro, et al.: 1.7% (first year) to 8.6% (third year) vs. 1.5%
- Why?
-
-
A 1GB/s interconnect transfers 1GB in one sec
- But what's a GB?
- For bandwidth, use 1GB = 10^9B
- For storage, use 1GB = 2^30B = 1.074 * 10^9B
-
So 1GB/sec is 0.93GB in one second
- About 7% error
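Checking the ~7% figure:

```python
gb_bandwidth = 10**9   # 1 GB as used for bandwidth
gb_storage   = 2**30   # 1 GB as used for storage

ratio = gb_bandwidth / gb_storage
print(f"1 GB/s moves {ratio:.3f} storage-GB per second")  # ~0.931
print(f"error: {1 - ratio:.1%}")                          # ~6.9%
```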
-
-
Pitfall: Offloading to I/O Processors
-
Overhead of managing I/O processor requests may dominate
- Quicker to do small operation on the CPU
- But I/O architecture may prevent that
-
I/O processor may be slower
- Since it's supposed to be simpler
-
Making it faster makes it into a major system component
- Might need its own coprocessors!
-
-
Pitfall: Backing Up to Tape
-
Magnetic tape used to have advantages
- Removable, high capacity
- Advantages eroded by disk technology developments
-
Makes better sense to replicate data
- E.g., RAID, remote mirroring
-
-
Fallacy: Disk Scheduling
-
Best to let the OS schedule disk accesses
-
But modern drives deal with logical block addresses
- Map to physical track, cylinder, sector locations
- Also, blocks are cached by the drive
-
OS is unaware of physical locations
- Reordering can reduce performance
- Depending on placement and caching
-
-
-
Pitfall: Peak Performance
-
Peak I/O rates are nearly impossible to achieve
- Usually, some other system component limits performance
-
E.g., transfers to memory over a bus
- Collision with DRAM refresh
- Arbitration contention with other bus masters
-
E.g., PCI bus: peak bandwidth ~ 133 MB/sec
- In practice, max 80MB/sec sustainable
-
6.13 Concluding Remarks
-
Concluding Remarks
-
I/O performance measures
- Throughput, response time
- Dependability and cost also important
-
Buses used to connect CPU, memory, I/O controllers
- Polling, interrupts, DMA
-
I/O benchmarks
- TPC, SPECSFS, SPECWeb
-
RAID
- Improves performance and dependability
-